Abstract. The underlying reality of a succession of interbreeding populations
is a vastly complicated network N. Since Darwin, species trees have been used
as a simplified description of the relationships which summarize the overly
complicated network N. Recent evidence of hybridization and lateral gene
transfer, however, suggests that there are situations where trees are
inadequate. Consequently it is important to determine properties that
characterize networks closely related to N and possibly more complicated than
trees but lacking the full complexity of N.

A connected surjective digraph map (CSD) is a map f from one network N to
another network M which either collapses an arc to a single point or takes an
arc to an arc, which is surjective, and such that the inverse image of a point
is always connected. CSD maps are shown to behave well under composition. If
there is such a CSD map, the network M is shown to arise naturally as a
quotient structure from N. It is proved that if there is a CSD map from N to
M, then there is a way to lift an undirected version of M into N, possibly
with added resolution. A CSD map from N to M puts strong constraints on N; if
the map were not connected, there would be minimal constraints.

A procedure is defined, given N, to construct a standard successively
cluster-distinct network from N. In general, it may be useful to study classes
of networks such that, for any N, there exists a CSD map from N to some
standard member of that class.

Keywords: digraph; network; connected; hybrid; phylogeny; homomorphism 

1 Introduction 

Since Darwin, phylogenetic trees have been used to display relationships among 
species, and they have become a standard tool in phylogeny. More recently, in 
order to deal with the possibilities of such events as hybridization and lateral 
gene transfer, more general phylogenetic networks have become of interest [15],
[17], [9], [6], [4], [16]. Different researchers have found it useful to make a broad
range of assumptions about the networks in order to be able to obtain various 
results. 

The underlying reality for, say, successive sexually reproducing populations 
of diploid organisms, is a complicated network N of parents and children of 
individual organisms — a full genealogy reaching back to ancestors in the remote 
past. Trying to reconstruct such a reality from extant taxa is a hopeless goal. 
Instead, we have often relied on a species tree T as a phylogeny at a more 
abstract level. In principle, the underlying complicated network N has been
usefully transformed into the much simpler species tree T. 

This paper explores relationships between N and other related networks M, 
potentially much simpler than N, but perhaps more complicated than trees. 
Other researchers have looked at similar problems. General frameworks for net- 
works are discussed in [1], [2], [15], and [17]. Typically these frameworks model 
phylogenies by acyclic rooted directed graphs. Wang et al. [21] and Gusfield 
et al. [11] study "galled trees" in which all recombination events are
associated with node-disjoint recombination cycles. Van Iersel and others
generalized galled trees to "level-k" networks [14]. Baroni, Semple, and Steel [2] introduced
the idea of a "regular" network, which coincides with its cover digraph. Cardona 
et al. [5] discussed "tree-child" networks, in which every vertex not a leaf has 
a child that is not a reticulation vertex. Dress et al. [10] consider alternative 
ways to derive trees, or, more generally, hierarchies from a network. Moret et 
al. [15] define a reduction R(N) of a network N of use in analyzing displayed
trees. 

Let N and M be phylogenetic X-networks. Such networks are rooted directed
graphs with specified leaf set X. (Further details are given in section 2.)
The basic tool studied in this paper is that of a connected surjective digraph
(CSD) map f : N → M. A formal definition is in section 3, but, roughly, such
a map f is a map on the vertex sets, f : V(N) → V(M), satisfying

(1) f is onto;

(2) whenever (u, v) is an arc of N, then either (f(u), f(v)) is an arc of M, or
else f(u) = f(v), and every arc of M arises in this manner;

(3) for each vertex v' of M, f^{-1}(v') consists of the vertices of a connected
subgraph of N.

CSD maps are special cases of graph homomorphisms, which have been the 
subject of recent investigations, including a recent book [13] by Hell and Nešetřil.
A review of graph homomorphisms, especially with applications to colorings, is 
in Hahn and Tardif [12]. These studies do not include studies of homomorphisms 
with property (3). Work by Daneshgar et al. [7] concerns "connected graph 
homomorphisms" but with a very different notion of connectedness, requiring 
that the inverse image of an edge be empty or connected. 

Figure 1 shows a network N and a network N' which happens to be a tree.
There is a CSD map f : N → N'. Each vertex v in N is labelled by the name
of the vertex f(v) in N'. The set of leaves, corresponding to extant taxa, is
X = {1, 2, 3, 4}. In this particular case, the tree N' is a plausible candidate for
the "species tree" corresponding to N.



The networks M for which there is a CSD map from N to M are seen in
section 3 to arise as certain quotient structures of N in a natural way.

Figure 1: Two X-networks N and N', in which N' happens to be a tree. There
is a CSD map f from N to N', given by the labelling of vertices in N. A certain
tree T displayed by N is shown in bold. In fact, section 5 shows N' = ClDis(N).



The condition that there be a CSD map f : N → N' is very different from
the condition that N' be displayed by N; i.e. that N contain a directed
X-subgraph isomorphic with N'. If N is the network in Figure 1, there is a CSD
map f from N to the tree N' with topology (1,(2,(3,4))). It is true that the tree
(1,(2,(3,4))) is displayed in N. But another tree T with topology (1,(4,(2,3)))
shown in bold in Figure 1 is also displayed in N, yet there is no CSD map from
N to T. If we restrict the map f to the tree T to yield the map f|T, then f|T
remains a surjective digraph map from T onto N', but it is not connected since
the preimage of vertex 34 is no longer connected.

The essential condition for a CSD map f : N → M is (3), that for each
vertex v of M the points of N mapping to v induce a connected subgraph of N.
In retrospect, this condition appears natural: The essential topological property
of a single point is that it is connected, i.e., all in one piece. The essential
biological property of a single population is that it is connected, since an organism
arises only from another organism. In order to find natural relationships among
networks N and M, we assume here that the points of N corresponding to a
single vertex in M should therefore also be connected.

In this paper it is proved (Theorem 4.1) that whenever f : N → N' is a
CSD map, then N' can be "lifted" into N in many ways, each called a wired lift
in this paper. Any wired lift is an undirected subgraph of N resembling N' but
possibly containing more resolution. Thus some aspects of N' also are exhibited
in N. The fact that each f^{-1}(v') is connected is essential to this possibility.
More generally, if f : N → N' is a CSD map, then N' places strong constraints
on the structure of N. In contrast, it is shown (Theorem 4.5) that without the
connectedness property, the constraints on N would be minimal.

Suppose X denotes the leaf set of the networks, corresponding to the set
of extant species on which measurements may be made. Following [2] define
the cluster of a vertex v in the network N, denoted cl(v, N), to be the set of
members of X which are descendants of v. A network N is called successively
cluster-distinct if, whenever (u, v) is an arc of N, then cl(u, N) ≠ cl(v, N).
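
The two definitions above can be made concrete with a short Python sketch. It assumes an acyclic network stored as a list of arcs (u, v) whose leaves are labelled directly by the members of X; the function names and the two toy networks are illustrative assumptions, not notation from the paper, and the toy networks are not those of Figure 1.

```python
def clusters(arcs, leaves):
    """Map each vertex v to cl(v, N), the set of leaves reachable from v.

    Assumes the digraph given by `arcs` is acyclic.
    """
    leaves = set(leaves)
    children = {}
    vertices = set(leaves)
    for u, v in arcs:
        children.setdefault(u, set()).add(v)
        vertices.update((u, v))
    memo = {}

    def cl(v):
        if v not in memo:
            found = {v} if v in leaves else set()
            for child in children.get(v, ()):
                found |= cl(child)
            memo[v] = frozenset(found)
        return memo[v]

    return {v: cl(v) for v in vertices}


def is_successively_cluster_distinct(arcs, leaves):
    """True if cl(u, N) != cl(v, N) for every arc (u, v)."""
    cl = clusters(arcs, leaves)
    return all(cl[u] != cl[v] for u, v in arcs)


# Hypothetical toy networks on X = {1, 2, 3}.
chain = [("r", "a"), ("a", "b"), ("b", 1), ("b", 2), ("r", 3)]
tree = [("r", "b"), ("b", 1), ("b", 2), ("r", 3)]
```

In `chain` the arc (a, b) joins two vertices with the same cluster {1, 2}, so that network is not successively cluster-distinct, whereas `tree` is.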

In section 5, given N we show how to construct a well-defined network
ClDis(N) which is successively cluster-distinct and such that there is a CSD
map f : N → ClDis(N). For example, if N is the network in Figure 1, then
N' = ClDis(N). The network ClDis(N) potentially is vastly simpler than N,
although it need not be a tree in general. The wired lift of ClDis(N) into N
shows that in some sense ClDis(N) can act as a "skeleton" of N. It is shown
(Corollary 5.4) that ClDis(N) has a "universal" property making it the best
cluster-distinct network related to N. This raises possible interest in the study of
successively cluster-distinct networks as a tool for studying general phylogenetic
networks.

Section 6 discusses some implications of these results. 

2 Fundamental Concepts 

A directed graph or digraph N = (V, A) consists of a finite set V of vertices and
a finite set A of arcs, each consisting of an ordered pair (u, v) where u ∈ V,
v ∈ V, u ≠ v. Sometimes we write V(N) for V. We interpret (u, v) as an
arrow from u to v and say that the arc starts at u and ends at v. There are no
multiple arcs and no loops. If (u, v) ∈ A, say that u is a parent of v and v is
a child of u. A directed path is a sequence u_0, u_1, ..., u_k of vertices such that
for i = 1, ..., k, (u_{i-1}, u_i) ∈ A. The path is trivial if k = 0. Write u ≤ v if
there is a directed path starting at u and ending at v. The digraph is acyclic
if there is no nontrivial directed path starting and ending at the same point. If
the digraph is acyclic, it is easy to see that ≤ is a partial order on V.
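
As a small illustration of the path relation and of acyclicity, the following Python sketch treats a digraph as a list of arcs. The helper names are assumptions made for this example only.

```python
def reachable(arcs, u):
    """Vertices v such that a (possibly trivial) directed path runs from u to v."""
    children = {}
    for a, b in arcs:
        children.setdefault(a, set()).add(b)
    seen, stack = {u}, [u]
    while stack:
        a = stack.pop()
        for b in children.get(a, ()):
            if b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def is_acyclic(arcs):
    """True if no nontrivial directed path starts and ends at the same vertex.

    A nontrivial closed path exists exactly when some arc (u, v) has u
    reachable from v, so it suffices to test each arc.
    """
    return all(u not in reachable(arcs, v) for u, v in arcs)


# Hypothetical example: r -> a -> b together with the shortcut r -> b.
example = [("r", "a"), ("a", "b"), ("r", "b")]
```

Here every vertex is reachable from r, and adding the arc (b, r) would create a directed cycle.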

The digraph (V, A) has root r if there exists r ∈ V such that for all v ∈ V,
r ≤ v. The graph is rooted if it has a root.

The indegree of vertex u is the number of v ∈ V such that (v, u) ∈ A. The
outdegree of u is the number of v ∈ V such that (u, v) ∈ A. If (V, A) is rooted
at r then r is the only vertex of indegree 0. A leaf is a vertex of outdegree 0.
A normal (or tree-child) vertex is a vertex of indegree 1. A hybrid vertex (or
recombination vertex or reticulation node) is a vertex of indegree at least 2.

Let X denote a finite set. Typically in phylogeny, X is a collection of species.
An X-network (V, A, r, X) is a digraph G = (V, A) with root r such that

(1) there is a one-to-one map φ : X → V such that the image of φ is the set of
all leaves of G, and

(2) for every v ∈ V there is a leaf u and a directed path from v to u.

Thus the set of leaves of G may be identified with the set X and every vertex
is ancestral to a leaf.
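
The conditions in this definition can be checked mechanically. The sketch below is one possible reading, under the simplifying assumption that the identification φ is the identity (leaves are labelled by the members of X themselves); `is_x_network` is a hypothetical helper name.

```python
def is_x_network(arcs, root, X):
    """Check the X-network conditions for a digraph given as a list of arcs.

    Assumes phi is the identity, so the leaf set must equal X exactly.
    """
    X = set(X)
    vertices = {root} | X | {w for arc in arcs for w in arc}
    children = {v: set() for v in vertices}
    parents = {v: set() for v in vertices}
    for u, v in arcs:
        if u == v:
            return False  # loops are not allowed in a digraph
        children[u].add(v)
        parents[v].add(u)

    def closure(starts, neighbours):
        seen, stack = set(starts), list(starts)
        while stack:
            a = stack.pop()
            for b in neighbours[a] - seen:
                seen.add(b)
                stack.append(b)
        return seen

    leaves = {v for v in vertices if not children[v]}
    return (leaves == X                                  # leaves are exactly X
            and closure({root}, children) == vertices    # root reaches every vertex
            and closure(leaves, parents) == vertices)    # every vertex reaches a leaf


# Hypothetical rooted tree with topology ((1,2),3).
small_tree = [("r", "a"), ("a", 1), ("a", 2), ("r", 3)]
```

The backward search from the leaves is what detects a vertex that is not ancestral to any leaf, which can only happen when the digraph has a directed cycle.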

In biology most X-networks are acyclic. The set X provides a context for G,
giving a hypothesized relationship among the members of X. For convenience,
we will write x for the leaf φ(x). It is quite common also that an outgroup
r' is used to identify the location of the root. When this happens, there is a
particular leaf r' ∈ X with indegree 1 such that (r, r') is an arc and is the only
arc ending at r'.

An X-tree is an X-network such that the underlying digraph is a rooted
tree.

Let N = (V, A, r, X) and N' = (V', A', r', X) be X-networks. An
X-isomorphism ψ : N → N' is a map ψ : V → V' such that

(1) ψ : V → V' is one-to-one and onto,

(2) ψ(r) = r',

(3) for each x ∈ X, ψ(x) = x,

(4) (ψ(u), ψ(v)) is an arc of N' iff (u, v) is an arc of N.

We say N and N' are isomorphic if there is an X-isomorphism ψ : N → N'.

A graph (or, for emphasis, an undirected graph) (V, E) consists of a finite
set V of vertices and a finite set E of edges, each consisting of a subset {v_1, v_2}
where v_1 and v_2 are two distinct members of V. Thus an edge has no direction,
while an arc has a direction. If u ∈ V, then the total degree of u is the number
of edges in E containing u. If G = (V, E) is a graph and W is a subset of V,
the induced subgraph G[W] is the graph (W, E[W]) where the edge set E[W] is
the collection of all {v_1, v_2} in E such that v_1 ∈ W and v_2 ∈ W. Thus G[W]
contains all edges both of whose endpoints are in W.

If G = (V, E) is a graph and {v_1, v_2} is an edge, then a new graph G' =
(V', E') may be obtained by adding a new vertex v_3 ∉ V, removing {v_1, v_2} and
adding two new edges {v_1, v_3} and {v_2, v_3}. Thus the new vertex v_3 has total
degree 2 in G'. We say that G is obtained from G' by suppressing the vertex
v_3 of total degree 2 and G' is obtained from G by inserting the vertex v_3 of
total degree 2. We say that G and G' are homeomorphic if there is a sequence
G = G_0, G_1, ..., G_k = G' of graphs such that for i = 1, ..., k, G_i is obtained from
G_{i-1} either by inserting a vertex of total degree 2 or by suppressing a vertex of
total degree 2.

A graph G = (V, E) is connected if, given any two distinct v and w in
V there exists a sequence v = v_0, v_1, v_2, ..., v_k = w of vertices such that for
i = 0, ..., k - 1, {v_i, v_{i+1}} ∈ E. A subset W of V is connected if the induced
subgraph G[W] is connected.



Given a digraph G = (V, A) define Und(G) = (V, E) where E = {{u, v} :
there is an arc (u, v) ∈ A}. Then Und(G) is an undirected graph with the
same vertex set as G and with edges obtained by ignoring the directions of arcs.
A subset W of V is connected if Und(G)[W] is connected. Thus a connected
subset of G is defined ignoring the directions of arcs.
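
A minimal Python sketch of Und(G) and of the connectedness test for a vertex subset W follows. It assumes the digraph has no loops, as required above; the function names are illustrative only.

```python
def und(arcs):
    """Edge set of Und(G): each arc (u, v) becomes the unordered pair {u, v}."""
    return {frozenset(arc) for arc in arcs}


def is_connected_subset(arcs, W):
    """True if Und(G)[W] is connected (sets with at most one vertex count as connected)."""
    W = set(W)
    if len(W) <= 1:
        return True
    edges = [e for e in und(arcs) if e <= W]  # edges of the induced subgraph
    start = next(iter(W))
    seen, stack = {start}, [start]
    while stack:
        a = stack.pop()
        for e in edges:
            if a in e:
                (b,) = e - {a}  # the other endpoint of the edge
                if b not in seen:
                    seen.add(b)
                    stack.append(b)
    return seen == W


# Hypothetical digraph with a hybrid vertex h whose parents are a and b.
hybrid = [("r", "a"), ("r", "b"), ("a", "h"), ("b", "h")]
```

In this example {a, b} is not connected, because no arc joins a and b, but {a, h, b} is connected even though no directed path runs from a to b.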

3 Connected Surjective Digraph Maps 

Let N = (V, A, r, X) and N' = (V', A', r', X) be X-networks whose leaf sets
are identified with the same set X. An X-digraph map f : N → N' is a map
f : V → V' such that

(a) f(r) = r',

(b) for all x ∈ X, f(x) = x, and

(c) if (u, v) is an arc of N, then either f(u) = f(v) or else (f(u), f(v)) is an arc
of N'.

Call f connected if for each v' ∈ V', f^{-1}(v') is a connected subset of N,
i.e., if the induced subgraph Und(N)[f^{-1}(v')] is connected. Call f surjective if
for each v' ∈ V', f^{-1}(v') is nonempty and for each arc (a, b) of N' there exist
vertices u and v of N such that (u, v) is an arc of N, f(u) = a, and f(v) = b.
The kernel of f is the partition {f^{-1}(v') : v' ∈ V'} of V.
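
The three properties can be tested separately. The Python sketch below assumes networks are given as arc lists and the map f as a dictionary on vertices; all names and both example maps are hypothetical and chosen only to illustrate the definitions.

```python
def kernel(f):
    """Fibres f^{-1}(v'), keyed by the image vertex v'."""
    fibres = {}
    for v, image in f.items():
        fibres.setdefault(image, set()).add(v)
    return fibres


def is_x_digraph_map(arcs_n, arcs_m, f, root_n, root_m, X):
    """Conditions (a), (b), (c)."""
    arcs_m = set(arcs_m)
    return (f[root_n] == root_m
            and all(f[x] == x for x in X)
            and all(f[u] == f[v] or (f[u], f[v]) in arcs_m for u, v in arcs_n))


def is_surjective(arcs_n, arcs_m, f, root_m):
    """Every vertex and every arc of the target arises from the source."""
    vertices_m = {root_m} | {w for arc in arcs_m for w in arc}
    image_arcs = {(f[u], f[v]) for u, v in arcs_n}
    return vertices_m <= set(f.values()) and set(arcs_m) <= image_arcs


def is_connected_map(arcs_n, f):
    """Every fibre induces a connected subgraph of Und(N)."""
    for fibre in kernel(f).values():
        start = next(iter(fibre))
        seen, stack = {start}, [start]
        while stack:
            a = stack.pop()
            for u, v in arcs_n:
                for p, q in ((u, v), (v, u)):  # ignore arc directions
                    if p == a and q in fibre and q not in seen:
                        seen.add(q)
                        stack.append(q)
        if seen != fibre:
            return False
    return True


# Hypothetical target tree M, with two source networks mapping onto it.
M = [("r'", "c"), ("c", 1), ("c", 2), ("r'", 3)]
N1 = [("r", "a"), ("a", "b"), ("b", 1), ("b", 2), ("r", 3)]
f1 = {"r": "r'", "a": "c", "b": "c", 1: 1, 2: 2, 3: 3}
M2 = [("r'", "c"), ("c", 1), ("c", 2)]
N2 = [("r", "a"), ("r", "b"), ("a", 1), ("b", 2)]
f2 = {"r": "r'", "a": "c", "b": "c", 1: 1, 2: 2}
```

The map f1 collapses the arc (a, b) and is a CSD map. The map f2 is a surjective X-digraph map, but its fibre {a, b} is not connected, so it is not a CSD map.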

We are interested primarily in X-digraph maps that are both connected and
surjective. They will be called connected surjective digraph maps or CSD maps. 
Many of their properties are analogous to properties of homomorphisms [13] but 
properties involving the leaf set X and connectivity require special attention. 

Let N = (V, A, r, X) be an X-network, where φ : X → V gives the identification.
Suppose ~ is an equivalence relation on V. Let [v] denote the equivalence
class of v ∈ V. The equivalence relation ~ is called leaf-preserving provided
that for every x ∈ X, whenever u ∈ [x] and (u, v) is an arc, then v ∈ [x].

Let N = (V, A, r, X) be an X-network. Suppose ~ is an equivalence relation
on V. Let P = {[v] : v ∈ V} be the partition of V into equivalence classes.
Define the quotient digraph N' by N' = (V', A', r', X) where

(i) V' is the set of equivalence classes [v].

(ii) The root is r' = [r].

(iii) The member x ∈ X corresponds to [x]; i.e., the identification is given by
φ' : X → V' by φ'(x) = [φ(x)].

(iv) Let [u] and [v] be two equivalence classes. There is an arc ([u], [v]) ∈ A' iff
[u] ≠ [v] and there exist u' ∈ [u] and v' ∈ [v] such that (u', v') ∈ A.

Alternative notations for N' will be N/~ or N/P.
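
The quotient construction is easy to carry out explicitly. The following Python sketch represents an equivalence relation by a partition of V and each class by a frozenset; it takes the root of the quotient to be the class of r, which is an assumption of this sketch. The function names and the example partitions are hypothetical.

```python
def classes_from_partition(partition):
    """Map each vertex to its equivalence class, stored as a frozenset."""
    return {v: frozenset(block) for block in partition for v in block}


def is_leaf_preserving(arcs, X, cls):
    """Whenever u lies in the class of a leaf x and (u, v) is an arc, v lies there too."""
    leaf_classes = {cls[x] for x in X}
    return all(cls[v] == cls[u] for u, v in arcs if cls[u] in leaf_classes)


def quotient_digraph(arcs, root, cls):
    """Arcs and root of the quotient: arcs join distinct classes only."""
    q_arcs = {(cls[u], cls[v]) for u, v in arcs if cls[u] != cls[v]}
    return q_arcs, cls[root]


# Hypothetical network and two partitions of its vertex set.
N = [("r", "a"), ("a", "b"), ("b", 1), ("b", 2), ("r", 3)]
good = classes_from_partition([{"r"}, {"a", "b"}, {1}, {2}, {3}])
bad = classes_from_partition([{"r"}, {"a"}, {"b", 1}, {2}, {3}])
q_arcs, q_root = quotient_digraph(N, "r", good)
```

With the partition `good`, the arc (a, b) disappears and the quotient is a tree with four arcs. The partition `bad` is not leaf-preserving, since b lies in the class of leaf 1 while its child 2 does not.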

Theorem 3.1. Let N = (V, A, r, X) be an X-network. Suppose ~ is a
leaf-preserving equivalence relation on V. Let N' = N/~ = (V', A', r', X) be the
quotient digraph. Then

(1) N' is an X-network.

(2) The natural map φ : N → N' given by φ(u) = [u] is a surjective X-digraph
map with kernel the set of equivalence classes under ~.

(3) If each equivalence class [u] is connected in N, then φ is connected.

Proof. (1) It is immediate that (V', A') is a directed graph with no loops and no
multiple arcs. If u_0, u_1, ..., u_k is a directed path in N (so for i = 0, ..., k - 1,
(u_i, u_{i+1}) ∈ A), then [u_0], [u_1], ..., [u_k] is a sequence of vertices in N' and for
each i = 0, ..., k - 1, either [u_i] = [u_{i+1}] or else ([u_i], [u_{i+1}]) ∈ A'. It follows
that r' is a root of N'.

Suppose x ∈ X; we show that [x] is a leaf of N'. Suppose there is an arc
([x], [y]). Then there exist a ∈ [x] and b ∈ [y] such that (a, b) ∈ A. Since ~
is leaf-preserving, b ∈ [x] so [x] = [y], contradicting that there are no loops in
(V', A').

Conversely, suppose that [u] is a leaf of N'; I claim that there exists x ∈ X
such that [u] = [x]. If not, then no vertex of N in [u] is a leaf, since ~ is
leaf-preserving. Since N is an X-network, we may choose a directed path in N
starting at u to some leaf x. Since x is a leaf, x ∉ [u], so N' has an arc from [u]
to some other vertex, contradicting that [u] is a leaf.

Finally, given a vertex [u] ∈ V', note that there is a leaf x ∈ V such that N
contains a directed path from u to x; it follows that in N' there is a directed
path from [u] to [x].

(2) We check the conditions (a), (b), and (c) for being an X-digraph map.
Condition (a) is immediate. For (b), note that if x ∈ X, then φ(x) = [x]. To see
(c), suppose (u, v) is an arc of N. Then either [u] = [v] or else ([u], [v]) is an arc of
N'. To see surjectivity, it is immediate that φ^{-1}([u]) = [u] is nonempty. Given
an arc ([u], [v]) of N' there exist u' ∈ [u] and v' ∈ [v] such that (u', v') ∈ A, but
then φ(u') = [u] and φ(v') = [v].

(3) follows since φ^{-1}([u]) = [u]. □

If N is acyclic, it need not follow that N' is also acyclic. An obvious necessary
condition for N' to be acyclic, however, is that each equivalence class be closed
under directed paths in N. More precisely, if B is an equivalence class as in
Theorem 3.1, u and v are in B, and there is a directed path u = u_0, u_1, ..., u_k =
v in N from u to v, then we must have {u_0, u_1, u_2, ..., u_k} ⊆ B.

The following converse shows that the image of a surjective digraph map is 
essentially the same as the natural quotient digraph. 

Theorem 3.2. Let N = (V, A, r, X) and N' = (V', A', r', X) be X-networks.
Suppose f : N → N' is a surjective X-digraph map. Define the relation ~ on
V by u ~ v iff f(u) = f(v). Then ~ is a leaf-preserving equivalence relation
and the equivalence classes are [u] = f^{-1}(f(u)). Moreover the quotient digraph
N/~ is isomorphic with N' via the map φ : N/~ → N' given by φ([u]) = f(u).

Proof. It is immediate that ~ is an equivalence relation and that φ is one-to-
one and onto. To see that ~ is leaf-preserving, suppose x ∈ X, u ∈ V satisfies
u ∈ f^{-1}(x), and v ∈ V satisfies that (u, v) is an arc. We must show that
v ∈ f^{-1}(x). But since f is a digraph map, either f(u) = f(v) or (f(u), f(v)) is
an arc. In the former case f(v) = f(u) = x; in the latter case there is an arc
from f(u) = x to f(v), contradicting that x is a leaf in N'.

If ([u], [v]) is an arc of N/~ then there exist u' ∈ [u] and v' ∈ [v] such that
(u', v') is an arc of N. Since f(u') ≠ f(v') and f is an X-digraph map it follows
(f(u'), f(v')) is an arc of N'. Conversely, suppose (a, b) is an arc of N'. Since
f is surjective there exist vertices u and v of N such that (u, v) is an arc of N,
f(u) = a, and f(v) = b. Since a ≠ b it follows [u] ≠ [v], so ([u], [v]) is an arc of
N/~ which satisfies that φ([u]) = a and φ([v]) = b. □

The connectedness of the inverse images of points implies the connectedness 
of the inverse images of more general connected sets: 

Theorem 3.3. Let N = (V, A, r, X) and N' = (V', A', r', X) be X-networks.
Let f : N → N' be a CSD map. If B ⊆ V' is connected in N', then f^{-1}(B) is
connected in N.

Proof. Write B = {v'_1, v'_2, ..., v'_k}. Then f^{-1}(B) = ∪[f^{-1}(v'_i) : i = 1, ..., k].
Since B is connected, there exist arcs (v'_{a_i}, v'_{b_i}) for i = 1, ..., m such that these
arcs connect together the members of B. Since f is surjective, for each i there
exist vertices v_{a_i} ∈ f^{-1}(v'_{a_i}) and v_{b_i} ∈ f^{-1}(v'_{b_i}) such that (v_{a_i}, v_{b_i}) ∈ A. But
now since each set f^{-1}(v'_i) is connected, it follows that f^{-1}(B) is connected. □

Theorem 3.4. Let N, N', and N'' be X-networks. Let f : N → N' and g : N' → N''
be X-digraph maps.

(a) The composition g ∘ f : N → N'' is an X-digraph map.

(b) If f and g are surjective, then g ∘ f is surjective.

(c) If f and g are connected and surjective, then g ∘ f is connected and surjective.

Proof. (a) and (b) are immediate. For (c), suppose f and g are connected and
surjective. From (b), g ∘ f is surjective. For any vertex v'' of N'', (g ∘ f)^{-1}(v'') =
f^{-1}(g^{-1}(v'')). Since g is connected, g^{-1}(v'') is connected. But then by Theorem
3.3 since f is connected, f^{-1}(g^{-1}(v'')) is connected. □

It follows that the composition of any number of CSD maps is also a CSD 
map. The network which is the image of the last map is thus a quotient digraph 
of the first network. 

We next show that in certain circumstances a CSD map f can be factored
as f = h ∘ g, where g and h are CSD maps.

Suppose N = (V, A, r, X) is an X-network. A partition Q of V is subordinate
to a partition P of V provided, for each A ∈ Q, there exists B ∈ P such that
A ⊆ B.

Theorem 3.5. Let N = (V, A, r, X) and N' = (V', A', r', X) be X-networks.
Let f : N → N' be a surjective X-digraph map with kernel P = {f^{-1}(v) : v ∈
V'}. Suppose Q is a partition of V that is subordinate to P.

(1) There exist surjective X-digraph maps g : N → N/Q and h : N/Q → N'
such that f = h ∘ g.

(2) If in addition f is connected and each member of Q is connected, then both
h and g are connected.

Proof. (1) Write [v]_Q for the member of Q that contains vertex v. Define g by
g(v) = [v]_Q. If [v]_Q is a vertex of N/Q define h([v]_Q) = [v]_P. Note that if
[v_1]_Q = [v_2]_Q, then v_1 and v_2 are in the same member of the partition Q, whence
because Q is subordinate to P we have [v_1]_P = [v_2]_P. Hence both g and h are
well-defined. Moreover, (h ∘ g)(v) = h(g(v)) = h([v]_Q) = [v]_P = f(v) using
Theorem 3.1.

Since f is surjective, for each v' ∈ V' there exists v ∈ V(N) such that
f(v) = v'. Hence h([v]_Q) = v' and g(v) = [v]_Q, so h and g are surjective as
maps of sets. If (u', v') is an arc of N', then since f is surjective there exist
vertices u and v of N such that f(u) = u', f(v) = v', and (u, v) is an arc of N.
Hence (g(u), g(v)) is an arc of N/Q and h(g(u)) = u', h(g(v)) = v' in N', so h
is surjective. Moreover, g is surjective by Theorem 3.1.

For (2) suppose f is connected and each member of Q is connected. Each
vertex of N/Q is a subset B of V for B ∈ Q. By hypothesis B is connected,
so it follows that g is connected. Next suppose v ∈ V'; since f is surjective,
pick w ∈ f^{-1}(v). Then h^{-1}(v) is the image in N/Q of [w]_P. But [w]_P is
connected since f was connected, so its image in N/Q is also connected. Hence
h is connected. □
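
The factorization in part (1) can be sketched on vertex sets as follows. This Python fragment identifies each vertex of N/P with the corresponding vertex of N', so that h takes values in N' directly; the names `is_subordinate` and `factor`, and the example data, are assumptions made for illustration.

```python
def is_subordinate(Q, P):
    """True if every block of Q is contained in some block of P."""
    return all(any(set(A) <= set(B) for B in P) for A in Q)


def factor(f, Q):
    """Return dictionaries (g, h) with h[g[v]] == f[v] for every vertex v.

    g sends v to its block in Q; h sends a block of Q to the common
    image under f of its members.
    """
    g = {v: frozenset(A) for A in Q for v in A}
    h = {}
    for A in Q:
        images = {f[v] for v in A}
        if len(images) != 1:
            raise ValueError("Q is not subordinate to the kernel of f")
        h[frozenset(A)] = images.pop()
    return g, h


# Hypothetical map f with kernel P, and a finer partition Q.
f = {"r": "r'", "a": "c", "b": "c", 1: 1, 2: 2, 3: 3}
P = [{"r"}, {"a", "b"}, {1}, {2}, {3}]
Q = [{"r"}, {"a"}, {"b"}, {1}, {2}, {3}]
g, h = factor(f, Q)
```

Well-definedness of h is exactly the subordination hypothesis: a block of Q that met two different fibres of f would have two candidate images.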

4 Wired lifts 

The next result, Theorem 4.1, shows that when f : N → N' is a CSD map, then
in a certain sense the network N' can "almost" be identified as a subgraph in
N. In fact, there is a "wired lift" M of N' into N consisting of an undirected
subgraph M of N which resolves N'. In fact, there are numerous such wired
lifts, at least one for any of a certain collection of arbitrary choices.

More explicitly, let G' = (V', E') be an (undirected) graph with leaf set X.
A graph G = (V, E) with leaf set X is a resolution of G' provided that G' is
obtained from G by recursively contracting certain edges. In each step, an edge
{u, v} of G is contracted by removing the edge and identifying the two endpoints
together. No edge with an endpoint in X is allowed to be contracted.

Every graph is a resolution of itself. 
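
A single contraction step might be coded as below. The sketch stores an undirected graph as a set of two-element frozensets and merges one named endpoint into the other; the function name, its argument order, and the example graph are assumptions of this illustration.

```python
def contract(edges, keep, drop, X):
    """Contract the edge {keep, drop}, identifying `drop` with `keep`.

    Edges incident to a member of X may not be contracted.
    """
    edge = frozenset((keep, drop))
    if edge not in edges:
        raise ValueError("not an edge of the graph")
    if keep in X or drop in X:
        raise ValueError("edges with an endpoint in X cannot be contracted")
    result = set()
    for e in edges:
        if e == edge:
            continue  # the contracted edge itself disappears
        renamed = frozenset(keep if w == drop else w for w in e)
        if len(renamed) == 2:  # discard anything that would become a loop
            result.add(renamed)
    return result


# Hypothetical resolved quartet 12|34 with internal edge {a, b}.
quartet = {frozenset(p) for p in
           [("a", "b"), ("a", 1), ("a", 2), ("b", 3), ("b", 4)]}
star = contract(quartet, "a", "b", X={1, 2, 3, 4})
```

Contracting the internal edge of the quartet yields the star on {1, 2, 3, 4}, so the quartet is a resolution of the star.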

Let N = (V, A, r, X) and N' = (V', A', r', X) be X-networks. Suppose
f : N → N' is a surjective digraph map. A wired lift of N' is an undirected
subgraph M = (W, E) of Und(N) such that the following hold:

(1) For each arc (u', v') of N' there is exactly one arc (u, v) of N such that
f(u) = u', f(v) = v', and {u, v} is an edge of M. The set of all edges {u, v} so
obtained will be denoted E_1 and the set of all vertices which occur in any of the
edges {u, v} ∈ E_1 will be denoted V_1'. Let V_1 = V_1' ∪ X.

(2) Every edge {a, b} ∈ E either lies in E_1 or else satisfies f(a) = f(b).

(3) For each vertex v' of N', let V(v') = {w ∈ V_1 : f(w) = v'}. The induced
subgraph M[f^{-1}(v') ∩ W] is a tree with leaf set V(v').

We call E_1 the set of nondegenerate edges of M, since the image under f of
each such edge is an edge of N', not just a single vertex. Note that W ⊆ V and
E ⊆ E(Und(N)).

Intuitively, M is a subgraph of Und(N) that is a resolution of Und(N') in
that for each vertex v' of N', f^{-1}(v') ∩ W consists of the vertices of a tree,
all of whose vertices map to v', not necessarily a single point. The name "lift"
suggests that N' is being lifted into the domain of f.

The following theorem gives sufficient conditions for a wired lift to exist
given any choice of E_1. The essential property is that f be connected. In order
to have the possibility of always extending E_1 to a wired lift, the inverse image
of each vertex of N' must be connected.

Theorem 4.1. Let N = (V, A, r, X) and N' = (V', A', r', X) be X-networks.
Suppose f : N → N' is a CSD map. For each arc (u', v') of N' choose an arc
(u, v) of N such that f(u) = u', f(v) = v'. Let E_1 denote the set of edges {u, v}
of Und(N) so obtained. Then f has a wired lift M for which E_1 is the set of
nondegenerate edges. Each such wired lift M is a resolution of Und(N').

Proof. We may assume that N' does not consist of a single vertex, so every
vertex of N' is an endpoint of some arc of N'. Since f is surjective, the
construction of E_1 in the statement can be carried out. Recall that V_1 is the set of
all vertices of N that arise as an endpoint of some edge in E_1 or else lie in X.

For each vertex v' of N', recall V(v') = {w ∈ V_1 : f(w) = v'}. Note that
V(v') is nonempty since each vertex occurs in some arc. Since f is connected,
the graph N_{v'} := Und(N)[f^{-1}(v')] is connected. Consequently there exists a
subtree T_{v'} of N_{v'} that contains V(v'), for example a minimal spanning tree.
We may assume that T_{v'} has no leaves except the members of V(v') by removing
other leaves. Let V_2 denote the set of all vertices that lie on any T_{v'}, and let E_2
denote the set of all edges {u, v} that lie in any T_{v'}.

Define the graph M = (V_M, E_M) by V_M := V_1 ∪ V_2 and E_M := E_1 ∪ E_2.

I claim M is a wired lift. Each edge {u, v} in E_2 is contained in T_{v'} for
some v' and satisfies f(u) = f(v) = v'. Each edge {u, v} in E_1 is such that
either (f(u), f(v)) or (f(v), f(u)) is an arc of N'. This shows that M satisfies
properties (1) and (2) of wired lifts. Property (3) is immediate since T_{v'} is a
tree.

Finally, M is a resolution of Und(N') since, to obtain Und(N') from M, one
must merely contract every edge in E_2. □
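
The construction in this proof can be followed step by step in code. The Python sketch below is one possible rendering, assuming f is a CSD map given as a dictionary and N' has more than one vertex: it takes the first available preimage arc as the arbitrary choice for E_1, builds a breadth-first spanning tree of each fibre, and keeps only the tree paths that join members of V(v'). The function name and the example networks are hypothetical and are not the networks of Figure 2.

```python
def wired_lift(arcs_n, arcs_m, f, X):
    """Return (E1, E2), the nondegenerate and degenerate edges of one wired lift."""
    # Step 1: choose one preimage arc for each arc of N' (here, the first found).
    chosen = {}
    for u, v in arcs_n:
        chosen.setdefault((f[u], f[v]), (u, v))
    e1 = {frozenset(chosen[arc]) for arc in arcs_m}
    v1 = set(X) | {w for e in e1 for w in e}

    e2 = set()
    for image in set(f.values()):
        fibre = {w for w in f if f[w] == image}
        terminals = v1 & fibre  # the set V(v')
        # Step 2: breadth-first spanning tree of Und(N)[fibre].
        start = next(iter(terminals))
        parent = {start: None}
        queue = [start]
        for a in queue:  # the list grows while it is being scanned
            for u, v in arcs_n:
                for p, q in ((u, v), (v, u)):
                    if p == a and q in fibre and q not in parent:
                        parent[q] = a
                        queue.append(q)
        # Step 3: keep only tree edges on paths from terminals to the start,
        # so that every leaf of the kept subtree belongs to V(v').
        for t in terminals:
            w = t
            while parent[w] is not None:
                e2.add(frozenset((w, parent[w])))
                w = parent[w]
    return e1, e2


# Hypothetical network with one hybrid vertex h, mapped onto a star-like tree.
X = {1, 2, 3}
arcs_n = [("r", "a"), ("a", "b"), ("a", "c"), ("b", 1),
          ("b", "h"), ("c", "h"), ("c", 3), ("h", 2)]
arcs_m = [("r'", "m"), ("m", 1), ("m", 2), ("m", 3)]
f = {"r": "r'", "a": "m", "b": "m", "c": "m", "h": "m", 1: 1, 2: 2, 3: 3}
e1, e2 = wired_lift(arcs_n, arcs_m, f, X)
```

In this example the fibre over m is the undirected 4-cycle a, b, h, c, and the lift keeps three of its four edges. Which edge is dropped depends on the arbitrary choice of spanning tree, reflecting the remark that many wired lifts may exist.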

Observe that in the wired lift, the edges E_1 are in one-to-one correspondence
with the edges of Und(N'). All additional edges, i.e., those in E_2, are such that
both endpoints map under f to the same vertex of N'. Many different vertices
of M can project to the same vertex in N', but all those that do so form a tree.

Even though M is an undirected graph, each of the edges {u, v} ∈ E_1 may
be considered to have a preferred orientation of either (u, v) or (v, u) depending
on which is an arc of N.

For example, consider the networks N and N' in Figure 2. There is a
CSD map f : N → N' given by f(x) = x for x ∈ X, f(u) = [11] for u ∈
{11, 12, 16, 19, 20}, and f(u) = [13] for u ∈ {13, 14, 15, 17, 18}. A wired lift M
consists of all edges of Und(N) except {12, 18}. Note that in M, 18 has no
incoming directed arc from the directed graph N, but this is not a problem
since the wired lift M is an undirected graph. Indeed, simulations show that
the undirected graph M can be a maximum likelihood tree for sequence data
on X which arises by evolution along N. There is also a different wired lift,
consisting of all edges of Und(N) except {12, 13}.

Figure 2: Two X-networks N and N'. There is a CSD map from N to N'.
Here X = {1, 2, 3, 4, 5, 6, 7, 8}. A wired lift M consists of all edges of Und(N)
except {12, 18}.

The next few results show that a CSD map φ : N → N' can put strong
constraints on the structure of N.

Corollary 4.2. Let N = (V, A, r, X) and N' = (V', A', r', X) be X-networks
and let φ : N → N' be a CSD map. Let U' be an (undirected) subgraph of
Und(N') such that no vertex has total degree in U' greater than 3. Then Und(N)
contains a subgraph U homeomorphic with U'.

Proof. Let M be a wired lift of N' into N. For each vertex u' of U', there are
at most three edges of U' with u' as one endpoint. If there are k edges, k ≤ 3,
then denote them {a'_1, u'_1}, ..., {a'_k, u'_k} with u'_1 = ... = u'_k = u'. Since φ
is surjective, there are k edges {a_i, u_i} in N for i = 1, ..., k, with φ(a_i) = a'_i
and φ(u_i) = u'. Since φ^{-1}(u') is connected, there is a tree T_{u'} in φ^{-1}(u') with
endpoints u_1, ..., u_k. Since k ≤ 3, we may modify T_{u'} if necessary so that no
vertex has total degree in T_{u'} greater than 3. Thus Und(N) contains a subgraph
U consisting of one edge for each edge of U' together with a tree T_{u'} for each
vertex u' of U'. A simple consideration of cases shows that U is homeomorphic
with U'. □



If U' has a vertex u' of total degree 4, then the corresponding tree T_{u'} may
contain a vertex of total degree 4 but might instead contain only vertices of
total degree 3, in which case there is no homeomorphism between U and U'.
Effectively, it is possible that U closely resembles U' but resolves some vertices
in U' of total degree greater than 3.

If {a, b, c, d} ⊆ X, the quartet ab|cd is the undirected tree with leaf set
{a, b, c, d} in which a and b share a parent and also c and d share a parent.

Corollary 4.3. Let N = (V, A, r, X) and N' = (V', A', r', X) be X-networks
and let φ : N → N' be a CSD map. If Und(N') contains a subgraph
homeomorphic with the quartet ab|cd, then so does Und(N).

Consider the special case of a CSD map from N to a tree T. Again, the
structure of T will be shown to put strong constraints on N. Lemma 4.4 shows
that if N and T are both binary X-trees, then in fact N and T are the same
tree.

If T is a rooted X-tree and a and b are in X, the most recent common
ancestor of a and b, denoted mrca(a, b), is the common ancestor of a and b
such that no strict descendant is also a common ancestor of a and b. If a, b, c
are distinct members of X, we say that T contains or displays the rooted triple
ab|c provided that the most recent common ancestor of a and c is itself a strict
ancestor of the most recent common ancestor of a and b.
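
These two notions translate directly into code. The Python sketch below stores a rooted tree as a child-to-parent dictionary; the helper names and the internal vertex labels "v234" and "v34" are hypothetical, and the example tree has the topology (1,(2,(3,4))) mentioned in the introduction.

```python
def ancestors(parent, v):
    """The chain v, parent(v), ... up to the root (v counts as its own ancestor)."""
    chain = [v]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def mrca(parent, a, b):
    """Most recent common ancestor of a and b in a rooted tree."""
    above_a = set(ancestors(parent, a))
    return next(w for w in ancestors(parent, b) if w in above_a)


def displays_triple(parent, a, b, c):
    """True if the tree displays ab|c: mrca(a, c) is a strict ancestor of mrca(a, b)."""
    top, low = mrca(parent, a, c), mrca(parent, a, b)
    return top != low and top in ancestors(parent, low)


# Tree with topology (1,(2,(3,4))), stored as child -> parent.
parent = {1: "r", "v234": "r", 2: "v234", "v34": "v234", 3: "v34", 4: "v34"}
```

For this tree, 34|2 and 34|1 are displayed, while 23|4 is not, because mrca(2, 4) and mrca(2, 3) coincide.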

Lemma 4.4. Let T and U be rooted X-trees. Suppose there is a CSD map
f : U → T.

(a) Every resolved rooted triple ab|c in T is also a rooted triple of U.

(b) If T is binary, then U = T.

Proof. The hypotheses mean that T and U are rooted X-trees in which there
may be additional vertices with indegree 1 and outdegree 1 (which often are
suppressed in trees).

We first show (a). Without loss of generality we may assume that 12|3 is in
T. We show 12|3 in U by considering other possibilities for {1, 2, 3} in U.

Suppose instead that U displays 13|2. Let a = mrca(1,2) in U and b =
mrca(1,3) in U. Let c = mrca(1,3) in T and d = mrca(1,2) in T. Since U
displays 13|2, in U there is a directed path from b to 1 and a directed path from
b to 3. It follows that in T there is a directed path from f(b) to f(1) = 1 and
from f(b) to f(3) = 3. Hence f(b) ≤ mrca(1,3) = c in T. It follows that the
image of the directed path in U from b to 1 is a directed path in T from f(b) to
1 which must pass through d. In particular, f^{-1}(d) must meet the path from
b to 1. Similarly, in U there is a directed path from a to 2 and from a to 3.
Hence f(a) ≤ mrca(2,3) = c in T. It follows that the directed path in U from
a to 2 must be mapped into a directed path in T from f(a) to 2, which must
pass through d. Hence f^{-1}(d) must meet the path from a to 2.

By hypothesis f is connected, so f^{-1}(d) is connected. Since f^{-1}(d) contains
a point on the path from a to 2 and also a point on the path from b to 1 and U
is a tree, we see that f(a) = f(b) = d. But this contradicts f(b) ≤ c, since c
is a strict ancestor of d in T. This shows that U cannot display 13|2.
A symmetric argument shows that U cannot display 23|1. We wish to show
U displays 12|3. The remaining possibility is that U displays the unresolved
star 123. In this case, let a denote the star point in U. In U there is a directed
path from a to 1 and also from a to 3. Hence in T there is a directed path from
f(a) to 1 and f(a) to 3, so f(a) ≤ c. In particular the path from a to 1 is taken
to a path in T that must pass through d, so the path from a to 1 meets f^{-1}(d).
Similarly in U there is a directed path from a to 2. Its image in T must pass
through d, so the path from a to 2 meets f^{-1}(d). Since f^{-1}(d) is connected and
U is a tree, it follows that f(a) = d. But this contradicts that f(a) ≤ c. Thus
this possibility cannot arise. This completes the proof of (a).

Part (b) follows from (a) since a rooted tree is determined by its rooted
triples; see [3] or [20], p. 118. □

More generally, if f : U → T is a CSD map and both networks are X-trees,
then U possibly resolves some polytomies of T but otherwise agrees with T.
The tree displayed in bold in Figure 1 shows that Lemma 4.4 is not true if f is
merely surjective but not connected.

If N' is known and f : N → N' is a surjective digraph map but not necessarily
connected, very little information about N can be inferred. The star network
with leaf set X and, for x ∈ X, multiplicity p(x) is the directed multigraph with
vertex set X ∪ {r}, root r, and p(x) arcs (r, x) for each x ∈ X; there are no other
vertices or arcs. The following theorem shows that any acyclic X-network N'
is the image of an X-network homeomorphic to a star network by a surjective
digraph map. Hence if f : N → N' is a surjective digraph map that is not
connected, then N' puts negligible constraint on the structure of N.

Theorem 4.5. Let N' = (V', A', r', X) be an acyclic X-network. There exists
an X-network N = (V, A, r, X) which is homeomorphic with a star network with
leaf set X and a surjective digraph map f : N → N'.

Proof. For each x ∈ X, let P(x) be the collection of directed paths in N' from r'
to x. Suppose there are p(x) = |P(x)| such paths where, for i = 1, ..., p(x), the
i-th path has k(x,i) arcs and is given by r' = v_(x,i,0), v_(x,i,1), ..., v_(x,i,k(x,i)) = x.
Construct N with p(x) paths from r to x, with no vertices in common except
r and x. The i-th such path has vertices r, w_(x,i,1), w_(x,i,2), ..., w_(x,i,k(x,i)) =
x. Each arc of N arises as an arc from such a path, and there are no other
arcs. There is a surjective digraph map f : N → N' given by f(r) = r' and
f(w_(x,i,j)) = v_(x,i,j). Note that N is homeomorphic to a star network with p(x)
arcs from r to x and no other arcs. □

See Figure 3 for an example. In fact, instead of each full collection P(x) one
may use subsets of the P(x) chosen so that each arc of N' occurs in some
selected path.
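
The construction in the proof of Theorem 4.5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: N' is a simple acyclic digraph given as a list of arcs with string vertex names, the root of N reuses the name of the root of N', and each new interior vertex w_(x,i,j) is represented by the tuple (x, i, j). The function names and the example network are hypothetical.

```python
def root_to_leaf_paths(arcs, root, leaf):
    """Enumerate all directed paths (as vertex lists) from root to leaf in an acyclic digraph."""
    children = {}
    for u, v in arcs:
        children.setdefault(u, []).append(v)
    paths, stack = [], [[root]]
    while stack:
        path = stack.pop()
        if path[-1] == leaf:
            paths.append(path)
            continue
        for child in children.get(path[-1], []):
            stack.append(path + [child])
    return paths


def star_like_preimage(arcs, root, leaves):
    """Return (arcs of N, map f) where N has one internally disjoint root-to-x path
    for every root-to-x path of N', and f sends each copied vertex to its original."""
    new_arcs, f = [], {root: root}
    for x in leaves:
        f[x] = x
        for i, path in enumerate(sorted(root_to_leaf_paths(arcs, root, x)), start=1):
            # Copy the i-th path: keep its endpoints, rename interior vertices to (x, i, j).
            copy = [root] + [(x, i, j) for j in range(1, len(path) - 1)] + [x]
            for j, w in enumerate(copy):
                f[w] = path[j]
            new_arcs.extend(zip(copy, copy[1:]))
    return new_arcs, f


# Hypothetical acyclic network N' with one hybrid vertex h and leaves 1, 2, 3.
arcs_prime = [("r", "a"), ("r", "b"), ("a", "h"), ("b", "h"),
              ("a", "1"), ("h", "2"), ("b", "3")]
n_arcs, f = star_like_preimage(arcs_prime, "r", ["1", "2", "3"])
```

For this example N has two separate paths from the root to leaf 2, one through a copy of a and one through a copy of b, so the hybrid structure of N' is invisible in N even though f maps N onto every vertex and arc of N'.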

5 Successively Cluster-Distinct Networks 

Let P(X) denote the collection of subsets of X. Following [2], given an
X-network N = (V, A, r, X), define the cluster map cl : V → P(X) by cl(v) =
{x ∈ X : v ≤ x}, and call cl(v) the cluster of v. Sometimes for clarity cl(v)
will also be denoted cl(v, N). The taxon v has the possibility of influencing the
extant genomes for taxa in cl(v) but cannot influence the genomes of taxa not
in cl(v).

Figure 3: Two X-networks N and N'. There is a surjective digraph map f from
N to N' given by labeling each vertex v of N with the label of f(v) in N'. The
map f is not connected, and N is homeomorphic to a star network. None of the
relationships in N' between the leaves are present in N, and there is no wired
lift of N' into N.

Call an X-network successively cluster-distinct or more briefly cluster-distinct
if for each arc (a, b) it is true that cl(a) ≠ cl(b).
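
As an illustration, clusters and the cluster-distinct property can be computed directly from an arc list. The sketch below assumes that v ≤ x means there is a directed path (possibly of length 0) from v to x, and that leaves have outdegree 0; the function names and both example networks are hypothetical.

```python
def clusters(arcs, leaves):
    """Map each vertex v to cl(v), the frozenset of leaves reachable from v
    by a directed path; the empty path counts, so cl(x) contains x for a leaf x."""
    leaf_set = set(leaves)
    vertices = set(leaves)
    children = {}
    for u, v in arcs:
        children.setdefault(u, []).append(v)
        vertices.update((u, v))
    cl = {}
    for start in vertices:
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for w in children.get(u, []):
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        cl[start] = frozenset(seen & leaf_set)
    return cl


def is_cluster_distinct(arcs, leaves):
    """True if cl(a) != cl(b) for every arc (a, b)."""
    cl = clusters(arcs, leaves)
    return all(cl[a] != cl[b] for a, b in arcs)


# Hypothetical example in which r and s have the same cluster {1, 2, 3}.
repeated = [("r", "s"), ("s", "t"), ("t", "1"), ("t", "2"), ("s", "3")]
# Hypothetical example in which every arc changes the cluster.
distinct = [("r", "t"), ("t", "1"), ("t", "2"), ("r", "3")]
```

Here `repeated` is not cluster-distinct because the arc (r, s) joins two vertices with the same cluster, while `distinct` is cluster-distinct.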

Networks which are not cluster-distinct may have many successive vertices 
in a directed path all of which have the same cluster and hence potentially leave 
genetic influence on precisely the same extant vertices (members of X). It will 
therefore be hard to distinguish their different genetic impacts on extant taxa. 
Consequently it is plausible to simplify such a network in order to highlight 
features that are more likely distinguishable. 

The following algorithm Cluster-Distinct takes as input a network N and
essentially outputs a network ClDis(N) which is successively cluster-distinct.
The idea is very simple. Whenever (u, v) is an arc and cl(u, N) = cl(v, N), then
u and v are identified. Clearly {u, v} is connected in N since (u, v) is an arc.
As a result of doing all such identifications, one obtains ClDis(N).

Here is a more precise description of the algorithm: 

Algorithm Cluster-Distinct 

Input: N = (V, A, r, X) is a network with leaf set X.

Output: A partition of V.

Procedure: We construct a sequence S_i of sets of subsets of V.

(1) Let S_0 be the set of singleton sets from V.

(2) Repeat recursively the following if any such step can be performed: Given
S_i, suppose distinct B_1 and B_2 in S_i satisfy that u_1 ∈ B_1, u_2 ∈ B_2, (u_1, u_2) is
an arc of N, and cl(u_1, N) = cl(u_2, N). Then S_{i+1} is found by removing B_1 and
B_2 from S_i and adjoining B_1 ∪ B_2. Thus S_{i+1} := (S_i − {B_1, B_2}) ∪ {B_1 ∪ B_2}.

(3) Suppose for some m, S_m has been constructed but there are no further ways
to perform (2). Return S_m.

It is clear that S_m is a partition of V. Given N we denote by ClDis(N) :=
N/S_m. Call ClDis(N) the cluster-distinct network obtained from N.
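
Algorithm Cluster-Distinct can be sketched in Python as below. This is an illustrative implementation, not the author's code. It assumes the network is given as an arc list, that clusters are the leaves reachable by directed paths, and that the quotient N/S_m has an arc from block [u] to block [v] exactly when N has an arc (u, v) with [u] ≠ [v]; the quotient is defined earlier in the paper, so that last reading is an assumption here. All names are hypothetical.

```python
def cluster_distinct(arcs, leaves):
    """Return (partition S_m as a set of frozensets, arcs of the quotient N/S_m)."""
    leaf_set = set(leaves)
    vertices = set(leaves)
    children = {}
    for u, v in arcs:
        vertices.update((u, v))
        children.setdefault(u, []).append(v)

    def cl(v):
        """Cluster of v in N: leaves reachable from v by a directed path."""
        seen, stack = {v}, [v]
        while stack:
            u = stack.pop()
            for w in children.get(u, []):
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return frozenset(seen & leaf_set)

    cluster = {v: cl(v) for v in vertices}

    # Step (1): S_0 consists of singletons; block[v] is the block containing v.
    block = {v: frozenset([v]) for v in vertices}

    # Step (2): merge the blocks of u1 and u2 when (u1, u2) is an arc with equal clusters.
    # Clusters are computed in N and never change, so one pass over the arcs
    # leaves no further merge available.
    for u1, u2 in arcs:
        if cluster[u1] == cluster[u2] and block[u1] != block[u2]:
            merged = block[u1] | block[u2]
            for v in merged:
                block[v] = merged

    # Step (3): collect S_m and (assumed definition) the arcs of the quotient.
    partition = set(block.values())
    quotient_arcs = {(block[u], block[v]) for u, v in arcs if block[u] != block[v]}
    return partition, quotient_arcs


# Hypothetical example: cl(r) = cl(a) = {1, 2, 3} and cl(c) = cl(3) = {3}.
example = [("r", "a"), ("a", "b"), ("b", "1"), ("b", "2"), ("a", "c"), ("c", "3")]
partition, quotient_arcs = cluster_distinct(example, ["1", "2", "3"])
```

In the example the blocks are {r, a}, {b}, {1}, {2} and {c, 3}; note that the unmodified algorithm identifies the leaf 3 with its parent c.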

Theorem 5.1. Let N = (V, A, r, X) be an X-network. Let S_m denote the result
of performing Algorithm Cluster-Distinct.

(1) N/S_m is a cluster-distinct X-network.

(2) If N is acyclic, then N/S_m is acyclic.

(3) S_m does not depend on the order in which the operations of Cluster-Distinct
are performed.

Proof. (1) Note that for x ∈ X, whenever a vertex u is merged with a leaf x,
cl(u) = cl(x) = {x}. Hence the partition S_m is leaf-preserving. By Theorem
3.1, N/S_m is an X-network. Note that if u and v are in B ∈ S_m, then cl(u, N) =
cl(v, N). It is easy to see that [u] ∈ N/S_m satisfies cl([u], N/S_m) = cl(u, N).
To see that N/S_m is cluster-distinct, suppose ([u], [v]) is an arc of N/S_m. Then
there exist u' ∈ [u] and v' ∈ [v] with (u', v') an arc of N. If cl(u', N) = cl(v', N)
then by the algorithm [u] and [v] would be merged. Hence cl([u], N/S_m) ≠
cl([v], N/S_m).

(2) Suppose that there were a directed cycle [u] = [u_0], [u_1], [u_2], ..., [u_k] =
[u] in N/S_m. Then for j = 0, ..., k − 1, there exist u'_j and u''_j in [u_j] such
that (u''_j, u'_{j+1}) is an arc of N. It is immediate that if (w, v) is an arc in N,
then cl(w, N) contains cl(v, N). It follows that cl(u''_0, N) contains cl(u'_1, N) =
cl(u''_1, N), which contains cl(u'_2, N) = cl(u''_2, N), ..., which contains cl(u'_k, N) =
cl(u''_0, N). Hence all the clusters are the same, whence algorithm Cluster-Distinct
would merge them. Thus [u_0] = [u_1] = ... = [u_{k-1}] = [u_k].

(3) When the algorithm terminates, S_m consists of the equivalence classes
under the equivalence relation ≈ obtained as follows:

(a) First, define a relation ~ on V such that if (u, v) is an arc, v ∉ X, and
cl(u, N) = cl(v, N), then u ~ v and v ~ u.

(b) u ≈ w iff either u = w or else there exists a sequence u_0, u_1, ..., u_k such
that u = u_0, u_k = w, and for i = 0, ..., k − 1, u_i ~ u_{i+1}.

The equivalence classes clearly are independent of the order of operations. Hence
(3) follows. □

Corollary 5.2. There is a connected surjective X-digraph map φ : N →
ClDis(N). Moreover, ClDis(N) has a wired lift into N.

Proof. By induction, for all i, each member of S_i is connected, whence each
member of S_m is connected. The result follows from Theorem 4.1. □

We call φ the natural projection of N onto ClDis(N).

Theorem 5.3. Let N = (V, A, r, X) and N' = (V', A', r', X) be X-networks.
Let φ : N → ClDis(N) be the natural projection. Suppose f : N → N' is a
CSD map. Assume whenever (u, v) ∈ A, v ∉ X, and cl(u, N) = cl(v, N) that it
follows that f(u) = f(v). Then there exists a unique CSD map g : ClDis(N) →
N' such that f = g ∘ φ.

Proof. Let P and Q be respectively the kernels of φ and f. By hypothesis, P
is subordinate to Q. By Theorem 3.5 the desired map g exists, and g is a CSD
map since both f and φ are connected. Uniqueness is immediate. □

As a consequence, any wired lift of ClDis(N) into N shows that N has structure
mimicking that of ClDis(N). Hence it may be natural to restrict attention in a
given case to cluster-distinct networks. Such networks are typically much simpler
than the initial networks and exhibit much of the essential structure.

A map f : N → N' with kernel P is cluster-distinct if f is a CSD map
and whenever (u, v) is an arc of N and cl(u, N) = cl(v, N) but v ∉ X, then
f(u) = f(v). A cluster-distinct map f : N → N' is universal (for cluster-distinct
maps) provided that given any cluster-distinct map g : N → N'' there
is a unique cluster-distinct map h : N' → N'' such that g = h ∘ f.

The essential content of Theorem 5.3 is that the natural projection map
φ : N → ClDis(N) is universal. More explicitly, we have the following corollary:

Corollary 5.4. Let N = (V, A, r, X) be an X-network. Let φ : N → ClDis(N) =
N/S_m be the natural projection map where S_m is constructed by algorithm
Cluster-Distinct. Then φ is universal for cluster-distinct maps.

Proof. Suppose g : N → N' is a cluster-distinct map. By Theorem 5.3, there
exists a unique CSD map h : ClDis(N) → N' such that g = h ∘ φ. Since
ClDis(N) is cluster-distinct, it is immediate that h is cluster-distinct. □

Consider the network N in Figure 1. Then ClDis(N) is the tree N' shown
in Figure 1. The image in N' of each vertex in N under the corresponding
digraph map φ is indicated by the label of each vertex of N in Figure 1. In
general, however, ClDis(N) need not be a tree.

The author believes that, when one is trying to reconstruct a network N
from data, it is reasonable to try to reconstruct ClDis(N) instead. The reason
is that a great many properties of N are shared with ClDis(N). Corollary 5.4
suggests that one might as well assume that N is already cluster-distinct. For
a nontrivial example, in [10] a cluster C is called a tight cluster of N provided
that C is nonempty and whenever there is an undirected path from c ∈ C to
d ∈ X − C, then there exists a vertex w on the path such that cl(w) = C. It is
easy to show that a cluster C is a tight cluster of N if and only if it is a tight
cluster of ClDis(N).

There are several interesting variants of Algorithm Cluster-Distinct. One 
variant modifies step (2) so as never to identify a leaf with a parent having the 
same cluster. Thus we replace (2) by (2') as follows: 

(2') Repeat recursively the following if any such step can be performed: Given
S_i, suppose distinct B_1 and B_2 in S_i satisfy that u_1 ∈ B_1, u_2 ∈ B_2, (u_1, u_2) is
an arc of N, cl(u_1, N) = cl(u_2, N), and u_2 is not a leaf of N. Then S_{i+1} :=
(S_i − {B_1, B_2}) ∪ {B_1 ∪ B_2}.
The advantage of (2') is that tree-child leaves do not become hybrid in
ClDis(N). More generally, there are variants of Algorithm Cluster-Distinct so
that, if S_m is computed in the modified manner, then in N/S_m many hybrid
vertices will have outdegree 1. Further analyses of such networks may then yield
more resolution than the results of the unmodified algorithm.
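
The difference between step (2) and step (2') can be seen in a small sketch. The Python function below is an illustration under assumptions: it treats the members of X as the leaves, computes clusters as the leaves reachable by directed paths, and uses a union-find structure in place of the explicit sequence S_i. The function name, the `merge_leaves` flag and the example network are hypothetical.

```python
def cluster_distinct_partition(arcs, leaves, merge_leaves=True):
    """Partition produced by Algorithm Cluster-Distinct.
    merge_leaves=True uses step (2); merge_leaves=False uses step (2')."""
    leaf_set = set(leaves)
    vertices = leaf_set | {v for arc in arcs for v in arc}
    children = {v: [] for v in vertices}
    for u, v in arcs:
        children[u].append(v)

    def cl(v):
        """Cluster of v: leaves reachable from v by a directed path."""
        seen, stack = {v}, [v]
        while stack:
            for w in children[stack.pop()]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return frozenset(seen & leaf_set)

    cluster = {v: cl(v) for v in vertices}
    rep = {v: v for v in vertices}  # union-find parent pointers

    def find(v):
        while rep[v] != v:
            rep[v] = rep[rep[v]]  # path halving
            v = rep[v]
        return v

    for u1, u2 in arcs:
        if cluster[u1] != cluster[u2]:
            continue
        if not merge_leaves and u2 in leaf_set:
            continue  # step (2'): never identify a leaf with its parent
        rep[find(u1)] = find(u2)

    blocks = {}
    for v in vertices:
        blocks.setdefault(find(v), set()).add(v)
    return {frozenset(b) for b in blocks.values()}


# Hypothetical example: c has the single child 3, so cl(c) = cl(3) = {3}.
example_arcs = [("r", "a"), ("a", "b"), ("b", "1"), ("b", "2"), ("a", "c"), ("c", "3")]
with_leaf_merges = cluster_distinct_partition(example_arcs, ["1", "2", "3"])
without_leaf_merges = cluster_distinct_partition(example_arcs, ["1", "2", "3"],
                                                 merge_leaves=False)
```

With step (2) the leaf 3 is absorbed into the block {c, 3}; with step (2') the blocks {c} and {3} stay separate, while r and a are merged in both cases.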

6 Discussion 

This paper shows that the existence of a CSD map f from N to N' implies
interesting relationships between N and N'. By Theorems 3.4 and 3.5, CSD
maps have good functorial properties; the composition of CSD maps is a CSD
map, and certain CSD maps can be induced from other CSD maps. We have
given a construction of a standard cluster-distinct network ClDis(N) such that
there is a CSD map from N to ClDis(N). By Theorem 4.1, the CSD map
implies the existence of a wired lift of N' into N. Such wired lifts show that
some of the structure of N' exists in N as a "skeleton".

Since Darwin, trees have been the primary method to describe phylogenies.
Now that hybridization and lateral gene transfer have been shown [6], [9] to be
important biologically, we need to consider other types of networks to be allowed
in a useful analysis. The true network N containing each individual and all its
progeny is the underlying reality, but such a network N is too complicated
to allow reconstruction from extant taxa. A cartoon of such a network N is
shown in Figure 1, in which N' gives a plausible species tree for N. In this
case, N' = ClDis(N). In more complicated situations, however, the network
ClDis(N) does not need to be a tree.

This construction suggests that rather than allow all possible networks in 
our analysis, we might more usefully restrict our attention to cluster-distinct 
networks. After all, if the underlying reality is N, then ClDis(N) exists and
is much more susceptible of analysis. Moreover, since there is a wired lift of
ClDis(N) into N, there is hope of taking certain kinds of information about
ClDis(N) and inferring its truth in N.

CSD maps exist whose images are trees. Of special interest, however, is the 
possibility that there might be other classes of networks more general than trees 
but not as general as cluster-distinct networks. For example, one might consider 
networks that are both cluster-distinct and tree-child [5]. Simple extensions of 
the results in this paper would lead to a CSD map from N to such a network
and a wired lift of such a network into N. There are many other possibilities.

Future work should study more relationships between N and M if there is
a CSD map from N to M, possibly with additional assumptions.

Other relationships between networks have been proposed, such as a reduction
R(N) of the network N [15]. It is easy, however, to construct examples
showing that there need not be a CSD map from N to R(N).

This paper explicitly dealt with networks with vertex set V in which the
set X of species was in one-to-one correspondence with the set of leaves via a
one-to-one map φ : X → V. A more general notion of an X-network allows
the map φ not to be one-to-one and requires only that its image contain
the set of leaves. In this situation most of the results go through with slightly
different statements. A digraph map would require f(φ(x)) = φ(x).